We describe a compression model for semistructured documents, called the Structural Contexts Model, which takes advantage of the context information usually implicit in the structure of the text. The idea is to use a separate semiadaptive model to compress the text that lies inside each different structure type (e.g., different XML tag). The intuition behind the idea is that the distribution of all the...
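The per-structure idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a word-based order-0 frequency model, ignores the cost of storing the models themselves, and uses hypothetical names (`order0_bits`, `compare_models`). It only estimates the ideal code length of the same text under one global model versus one model per tag, with statistics gathered in a first pass (the semiadaptive part).

```python
import math
from collections import Counter, defaultdict


def order0_bits(words, counts):
    """Ideal code length (in bits) of `words` under a static word-frequency model."""
    total = sum(counts.values())
    return sum(-math.log2(counts[w] / total) for w in words)


def compare_models(tagged_text):
    """tagged_text: list of (tag, text) pairs.

    Returns (single_bits, per_tag_bits): the estimated size with one global
    model versus one model per tag. Model storage cost is NOT included.
    """
    # First pass: collect statistics before any coding (semiadaptive).
    global_counts = Counter()
    tag_counts = defaultdict(Counter)
    for tag, text in tagged_text:
        words = text.split()
        global_counts.update(words)
        tag_counts[tag].update(words)

    # Second pass: cost every word under the global model and under its tag's model.
    single = sum(order0_bits(text.split(), global_counts) for _, text in tagged_text)
    per_tag = sum(order0_bits(text.split(), tag_counts[tag]) for tag, text in tagged_text)
    return single, per_tag


if __name__ == "__main__":
    # Hypothetical toy document: two tags with disjoint vocabularies.
    docs = [("date", "2001 2002 2001"), ("name", "ann bob ann")]
    single, per_tag = compare_models(docs)
    print(f"single model: {single:.2f} bits, per-tag models: {per_tag:.2f} bits")
```

With empirical order-0 statistics the per-tag estimate can never exceed the single-model estimate. In a real compressor the saving has to be weighed against the overhead of transmitting several models.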
In the world of modern digital libraries, searching for juridical information of interest is a current and relevant problem. We approach this problem from the perspective that a new searching mechanism, specialized in the juridical area, will work better than standard solutions. We propose a specialized (or vertical) searching mechanism that combines information from a juridical thesaurus with...
This paper explores the use of texts that are related to an image collection, also known as collateral texts, for building thesauri in specialist domains to aid in image retrieval. Corpus linguistic and information extraction methods are used for identifying key terms and semantic relationships in specialist texts that may be used for query expansion purposes. The specialist domain context imposes...
This paper describes an application of genetic algorithms for text summarisation. We have built a sentence extraction algorithm that overcomes some of the drawbacks of traditional sentence extractors, and takes into consideration different features of the summaries. The fitness function can be easily modified in order to incorporate features such as user modelling and adaptation. The system has been...
The rapid growth of the Information Society is increasing the demand for technologies enabling access to multimedia data by content. An interesting example is given by the broadcasting companies and the content providers, which today require effective technologies to support the management and access of their huge audiovisual archives, both for internal use and public services. The Multilingual Natural...
Peer-to-peer (P2P) computing has shown unexpected growth and development in recent years. P2P networking is being applied from B2B enterprise solutions to simpler, everyday file-sharing applications like Gnutella clients. In this paper we investigate the use of the Gnutella P2P protocol for Information Retrieval by means of building and evaluating a general-purpose Web meta-search...
It has been postulated that a method of selecting terms in either routing or filtering using relevance feedback would be to evaluate every possible combination of terms in a training set and determine which combination yields the best retrieval results. Whilst this is not a realistic proposition because of the enormous size of the search space, some heuristics have been developed on the Okapi system...
A star-graph is a conceptual graph that contains a single relation, with some concepts linked to it. Star-graphs are elementary pieces of information describing combinations of concepts. We use star-graphs as descriptors (or index terms) for image content representation. This allows for relational indexing and expression of complex user needs, in comparison to classical text retrieval, where simple keywords...
The paper addresses the problem of clustering text documents coming from the Web. We apply clustering to support users in interactive browsing through hierarchically organized search results as opposed to the standard ranked-list presentation. We propose a clustering method that is tailored to on-line processing of Web documents and takes into account the time aspect, the particular requirements of...
In this paper we present an initial study on the use of both high and low level MPEG-7 descriptions for video retrieval. A brief survey of current XML indexing techniques shows that an IR-based retrieval method provides a better foundation for retrieval as it satisfies important retrieval criteria such as content ranking and approximate matching. An aggregation technique for XML document retrieval...
Current question answering systems rely on document retrieval as a means of providing documents which are likely to contain an answer to a user’s question. A question answering system heavily depends on the effectiveness of a retrieval system: If a retrieval system fails to find any relevant documents for a question, further processing steps to extract an answer will inevitably fail, too. In this...
Many people in Hungary use the Web to obtain information from public institutions and organizations. Because these users typically do not know the URL of the desired institution’s home page, they use a Web search engine to get there. Institutions’ names are usually difficult to recall exactly, so they are not used as queries in search engines. Instead, the acronyms of institutions are being...
In this paper we propose an incremental hierarchical clustering algorithm for on-line event detection. This algorithm is applied to a set of newspaper articles in order to discover the structure of topics and events that they describe. In the first level, articles with a high temporal-semantic similarity are clustered together into events. In the next levels of the hierarchy, these events are successively...
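The first level described in the abstract above, where articles with high temporal-semantic similarity are grouped into events, can be sketched as a single-pass clustering loop. This is an illustration under stated assumptions and not the authors' algorithm: the similarity (cosine over raw term counts multiplied by a linear time decay), the `threshold` and `window` parameters, and the function names are all hypothetical, and the higher levels of the hierarchy are not modelled.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign(clusters, text, day, threshold=0.3, window=7.0):
    """Add one article to the most similar event cluster, or open a new one.

    Assumed similarity: cosine(article, cluster term sums) scaled by a linear
    decay that reaches zero when the article is `window` days from the cluster.
    """
    vec = Counter(text.lower().split())
    best, best_sim = None, 0.0
    for c in clusters:
        decay = max(0.0, 1.0 - abs(day - c["day"]) / window)
        sim = cosine(vec, c["centroid"]) * decay
        if sim > best_sim:
            best, best_sim = c, sim

    if best is not None and best_sim >= threshold:
        n = len(best["docs"])
        best["centroid"].update(vec)                    # accumulate term counts
        best["day"] = (best["day"] * n + day) / (n + 1)  # running mean date
        best["docs"].append(text)
    else:
        clusters.append({"centroid": vec, "day": float(day), "docs": [text]})
    return clusters


if __name__ == "__main__":
    # Hypothetical articles as (text, publication day) pairs.
    events = []
    for text, day in [
        ("earthquake hits city", 1),
        ("earthquake city damage", 2),
        ("election results announced", 2),
        ("earthquake hits city", 30),
    ]:
        assign(events, text, day)
    for e in events:
        print(round(e["day"], 1), e["docs"])
```

Because each article is processed once as it arrives, the clusters can be updated on-line. The time decay keeps textually similar articles published far apart from merging into one event.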
In the field of the biomedical sciences there exists a vast repository of information located within large quantities of research papers. Very often, researchers need to spend considerable amounts of time reading through entire papers before being able to determine whether or not they should be curated (archived). In this paper, we present an automated text classification system for the classification...
In traditional content-based music information retrieval systems, users may face longer response times, since the traditional systems mostly do syntactic processing to match the query melody against the whole melodies of the underlying music database. Hence, there has been a growing need for a theme melody index that supports quick retrieval of the music relevant to a user’s query melody. In this paper, we...
The amount of information available on the web, as well as the number of e-businesses and web shoppers, is growing exponentially. Customers have to spend a lot of time browsing the net in order to find relevant information. One way to overcome this problem is to use dialoguing agents that exploit user profiles to generate personal recommendations. This paper presents a system, designed according...